Context dependent word modeling for statistical machine translation using part-of-speech tags
نویسندگان
چکیده
Word based translation models in particular and phrase based translation models in general assume that a word in any context is equivalent to the same word in any other context. Yet, this is not always true. The words in a sentence are not generated independently. The usage of each word is strongly affected by its immediate neighboring words. The state-ofthe-art machine translation (MT) methods use words and phrases as basic modeling units. This paper introduces Context Dependent Words (CDWs) as the new basic translation units. The context classes are defined using Part-of-Speech (POS) tags. Experimental results using CDW based language models demonstrate encouraging improvements in the translation quality for the translation of dialectal Arabic to English. Analysis of the results reveals that improvements are mainly in fluency.
منابع مشابه
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighbor phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
متن کاملStatistical Machine Translation of Parliamentary Proceedings Using Morpho-Syntactic Knowledge
This paper presents an overview of the University of Washington statistical machine translation system developed for the 2006 TCSTAR evaluation campaign. We use a statistical phrase-based system with multiple decoding passes and a log-linear probability model. Our main focus was on exploring the possibility of using morpho-syntactic knowledge (lemmas and part-of-speech tags) for word alignment,...
متن کاملRich Source-Side Context for Statistical Machine Translation
We explore the augmentation of statistical machine translation models with features of the context of each phrase to be translated. This work extends several existing threads of research in statistical MT, including the use of context in example-based machine translation (Carl and Way, 2003) and the incorporation of word sense disambiguation into a translation model (Chan et al., 2007). The con...
متن کاملStatistical Machine Translation with Local Language Models
Part-of-speech language modeling is commonly used as a component in statistical machine translation systems, but there is mixed evidence that its usage leads to significant improvements. We argue that its limited effectiveness is due to the lack of lexicalization. We introduce a new approach that builds a separate local language model for each word and part-of-speech pair. The resulting models ...
متن کاملClustered Word Classes for Preordering in Statistical Machine Translation
Clustered word classes have been used in connection with statistical machine translation, for instance for improving word alignments. In this work we investigate if clustered word classes can be used in a preordering strategy, where the source language is reordered prior to training and translation. Part-of-speech tagging has previously been successfully used for learning reordering rules that ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2007